TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Taking snapshots of the Web with a TEI camera

Internal identifier: 000376 (Main/Exploration); previous: 000375; next: 000377

Authors: D. Walker [Canada]

Source: Computers and the humanities (ISSN 0010-4817), 1999

RBID : Francis:524-99-12228

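French descriptors

Pascal: Linguistique informatique, Texte électronique, Description, Standardisation, Usage linguistique, Encodage, Internet, TEI, Linguistique de corpus

English descriptors

KwdEn: Computational linguistics, Corpus linguistics, Description, Electronic text, Standardization, TEI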

Abstract

Electronic texts are claimed to exhibit features distinct from those of their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and the structuring of the overall corpus.


Affiliations: Computing and Information Science, Queen's University, Kingston, Ontario, K7L 3N6, Canada



The document in XML format

<record>
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title xml:lang="en" level="a">Taking snapshots of the Web with a TEI camera</title>
          <author>
            <name sortKey="Walker, D" sort="Walker, D" uniqKey="Walker D" first="D." last="Walker">D. Walker</name>
            <affiliation wicri:level="1">
              <inist:fA14 i1="01">
                <s1>Computing and Information Science, Queen's University</s1>
                <s2>Kingston, Ontario, K7L 3N6</s2>
                <s3>CAN</s3>
                <sZ>1 aut.</sZ>
              </inist:fA14>
              <country>Canada</country>
              <wicri:noRegion>Kingston, Ontario, K7L 3N6</wicri:noRegion>
            </affiliation>
          </author>
        </titleStmt>
        <publicationStmt>
          <idno type="wicri:source">INIST</idno>
          <idno type="inist">524-99-12228</idno>
          <date when="1999">1999</date>
          <idno type="stanalyst">FRANCIS 524-99-12228 INIST</idno>
          <idno type="RBID">Francis:524-99-12228</idno>
          <idno type="wicri:Area/PascalFrancis/Corpus">000064</idno>
          <idno type="wicri:Area/PascalFrancis/Curation">000065</idno>
          <idno type="wicri:Area/PascalFrancis/Checkpoint">000058</idno>
          <idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000058</idno>
          <idno type="wicri:doubleKey">0010-4817:1999:Walker D:taking:snapshots:of</idno>
          <idno type="wicri:Area/Main/Merge">000403</idno>
          <idno type="wicri:Area/Main/Curation">000376</idno>
          <idno type="wicri:Area/Main/Exploration">000376</idno>
        </publicationStmt>
        <sourceDesc>
          <biblStruct>
            <analytic>
              <title xml:lang="en" level="a">Taking snapshots of the Web with a TEI camera</title>
              <author>
                <name sortKey="Walker, D" sort="Walker, D" uniqKey="Walker D" first="D." last="Walker">D. Walker</name>
                <affiliation wicri:level="1">
                  <inist:fA14 i1="01">
                    <s1>Computing and Information Science, Queen's University</s1>
                    <s2>Kingston, Ontario, K7L 3N6</s2>
                    <s3>CAN</s3>
                    <sZ>1 aut.</sZ>
                  </inist:fA14>
                  <country>Canada</country>
                  <wicri:noRegion>Kingston, Ontario, K7L 3N6</wicri:noRegion>
                </affiliation>
              </author>
            </analytic>
            <series>
              <title level="j" type="main">Computers and the humanities</title>
              <title level="j" type="abbreviated">Comput. humanit.</title>
              <idno type="ISSN">0010-4817</idno>
              <imprint>
                <date when="1999">1999</date>
              </imprint>
            </series>
          </biblStruct>
        </sourceDesc>
        <seriesStmt>
          <title level="j" type="main">Computers and the humanities</title>
          <title level="j" type="abbreviated">Comput. humanit.</title>
          <idno type="ISSN">0010-4817</idno>
        </seriesStmt>
      </fileDesc>
      <profileDesc>
        <textClass>
          <keywords scheme="KwdEn" xml:lang="en">
            <term>Computational linguistics</term>
            <term>Corpus linguistics</term>
            <term>Description</term>
            <term>Electronic text</term>
            <term>Standardization</term>
            <term>TEI</term>
          </keywords>
          <keywords scheme="Pascal" xml:lang="fr">
            <term>Linguistique informatique</term>
            <term>Texte électronique</term>
            <term>Description</term>
            <term>Standardisation</term>
            <term>Usage linguistique</term>
            <term>Encodage</term>
            <term>Internet</term>
            <term>TEI</term>
            <term>Linguistique de corpus</term>
          </keywords>
        </textClass>
      </profileDesc>
    </teiHeader>
    <front>
      <div type="abstract" xml:lang="en">Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents, and with the structuring the overall corpus</div>
    </front>
  </TEI>
  <affiliations>
    <list>
      <country>
        <li>Canada</li>
      </country>
    </list>
    <tree>
      <country name="Canada">
        <noRegion>
          <name sortKey="Walker, D" sort="Walker, D" uniqKey="Walker D" first="D." last="Walker">D. Walker</name>
        </noRegion>
      </country>
    </tree>
  </affiliations>
</record>
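
The record as displayed uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware XML parser will likely reject it as is. The following is a minimal sketch of one workaround, not a Dilib facility: it wraps the record in a root element that binds both prefixes, then reads the title and keywords with Python's standard library. The namespace URIs, the `wrapper` element, and the helper names `parse_record` and `keywords` are assumptions made for illustration; the source does not give the real namespace names. The `SAMPLE` string is a shortened excerpt of the record, and the sketch assumes the fragment carries no XML declaration.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumption): the page does not state the real namespaces.
ASSUMED_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}

def parse_record(record_xml):
    """Parse a <record> fragment whose wicri:/inist: prefixes are unbound."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in ASSUMED_NS.items())
    # Binding the prefixes on a wrapper element makes them valid for all descendants.
    wrapped = f"<wrapper {decls}>{record_xml}</wrapper>"
    return ET.fromstring(wrapped).find("record")

def keywords(record, scheme):
    """Return the <term> texts of the <keywords> block with the given scheme."""
    return [t.text for t in record.findall(f".//keywords[@scheme='{scheme}']/term")]

# Shortened excerpt of the record above, for demonstration only.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Taking snapshots of the Web with a TEI camera</title>
<author><affiliation wicri:level="1"><country>Canada</country></affiliation></author>
</titleStmt></fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Corpus linguistics</term><term>TEI</term></keywords>
</textClass></profileDesc></teiHeader></TEI></record>"""

record = parse_record(SAMPLE)
print(record.findtext(".//titleStmt/title"))
print(keywords(record, "KwdEn"))
```

Prefixed attributes such as `wicri:level` are then addressed by their expanded name, for example `"{urn:example:wicri}level"` under the placeholder URI above.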

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000376 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000376 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Francis:524-99-12228
   |texte=   Taking snapshots of the Web with a TEI camera
}}
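
The template call above could also be generated from a record's fields. Below is a minimal sketch, assuming a hypothetical helper named `explor_link`; it is not part of Dilib or Wicri. The parameter names (`wiki`, `area`, `flux`, `étape`, `type`, `clé`, `texte`) and the column alignment are copied from the template shown above.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Render an {{Explor lien}} call (hypothetical helper, not a Dilib tool)."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    # Pad each "   |name=" prefix to 13 characters so the values line up,
    # matching the layout of the template on this page.
    lines += [f"   |{name}=".ljust(13) + value for name, value in fields]
    lines.append("}}")
    return "\n".join(lines)

print(explor_link(
    "Wicri/Ticri", "TeiVM2", "Main", "Exploration",
    "RBID", "Francis:524-99-12228",
    "Taking snapshots of the Web with a TEI camera",
))
```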

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024